Homework due this week
Survey:
Don’t circulate anything yet! I’ll be making edits to the class survey and then providing everyone with a new link
At the end of the week, I’ll share some results from the pilot and you’ll be asked to do some basic analyses of the results.
Mid-term exam 1: March 12
The midterm will be online through ELMS.
It will open at 8am and be available until 4pm. Once you start, you’ll have 60 minutes to complete it.
We’ll have a Q & A/Review session on March 10, and there will be no sections the following Friday.
A spurious correlation occurs when X and Y appear to be correlated but a third variable (the confounder) actually accounts for both.
Why might we see a correlation between the amount of weekly protest activity and the number of flights to Hawaii?
The correlation is spurious because season drives both variables: summer weeks have more protests and more flights to Hawaii because that’s when the weather is nice.
How do we resolve this?
A simple fix here would be dis-aggregation: we split the data up based on the season and then we re-assess the same relationship within each group. This would ensure we were comparing summer flights to summer protests and non-summer flights to non-summer protests.
If the correlation is spurious, then the observed relationship will disappear or potentially even reverse (go from positive to negative or vice-versa) within each group.
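A minimal sketch of dis-aggregation, assuming a small made-up dataset. The weekly numbers below are invented purely for illustration (they are not real protest or flight counts), and `pearson` is a helper written for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly data: (season, protests, flights to Hawaii)
weeks = [
    ("summer", 9, 30), ("summer", 10, 31), ("summer", 11, 30), ("summer", 10, 29),
    ("other", 3, 20), ("other", 4, 21), ("other", 5, 20), ("other", 4, 19),
]

# Pooled correlation, ignoring season
overall = pearson([p for _, p, _ in weeks], [f for _, _, f in weeks])

# Same correlation re-assessed within each season (the dis-aggregation step)
within = {}
for season in ("summer", "other"):
    rows = [(p, f) for s, p, f in weeks if s == season]
    within[season] = pearson([p for p, _ in rows], [f for _, f in rows])

print(f"overall r = {overall:.2f}")
for season, r in within.items():
    print(f"{season} r = {r:.2f}")
```

In this toy data the pooled correlation is strongly positive, but it vanishes within each season, which is the signature of a spurious relationship.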
Voting on the 1964 Civil Rights Act looks contrary to what you might expect.
The bill was proposed by a Democratic president and mostly supported by Democratic leaders in the House and Senate, but a higher percentage of Republican House members voted for it.
| CRA vote | Dem | Rep |
|---|---|---|
| yes | 153 (63%) | 136 (80%) |
| no | 91 (37%) | 35 (20%) |
The pattern looks different once we consider region:
Effect of party overall (Dem minus Rep) = 63 - 80 = -17
Effect of party in the north = 95 - 85 = 10
Effect of party in the south = 9 - 0 = 9
The relationship flips direction (from negative to positive) and shrinks after accounting for region.
| CRA vote | North: Dem | North: Rep | South: Dem | South: Rep |
|---|---|---|---|---|
| yes | 145 (95%) | 136 (85%) | 8 (9%) | 0 (0%) |
| no | 8 (5%) | 24 (15%) | 83 (91%) | 11 (100%) |
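The party effects in the CRA example can be recomputed from the raw vote counts. This is a sketch: the counts are copied from the tables, while the variable and function names are just choices made for this example.

```python
# Vote counts from the tables: counts[region][party] = (yes, no)
counts = {
    "North": {"Dem": (145, 8), "Rep": (136, 24)},
    "South": {"Dem": (8, 83), "Rep": (0, 11)},
}

def pct_yes(yes, no):
    """Percentage voting yes."""
    return 100 * yes / (yes + no)

# Pooled comparison: add the regions together, then compare parties
pooled = {}
for party in ("Dem", "Rep"):
    yes = sum(counts[region][party][0] for region in counts)
    no = sum(counts[region][party][1] for region in counts)
    pooled[party] = pct_yes(yes, no)
pooled_effect = pooled["Dem"] - pooled["Rep"]

# Controlled comparison: compare parties within each region
region_effects = {
    region: pct_yes(*cells["Dem"]) - pct_yes(*cells["Rep"])
    for region, cells in counts.items()
}

print(f"pooled effect of party: {pooled_effect:.0f}")
for region, effect in region_effects.items():
    print(f"effect of party in the {region}: {effect:.0f}")
```

The pooled effect is negative (Democrats look less supportive), while both within-region effects are positive.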
The language here tends to throw people off:
Control groups are the baseline group in an experiment. The control group in a medical study would be the people who received the placebo treatment.
Control variables are the confounding variables that we address by holding them constant, so their effects on the outcome can’t distort the comparison.
Randomization and control variables attempt to accomplish the same goals, but in different ways.
Dependent variable(s): the outcome of interest.
Independent variable(s): the main explanatory variable
Control variable(s): additional variables included to account for confounding
There are three potential outcomes we might see when examining a relationship after controlling for another variable:
Spurious: controlling for Z accounts for the entire correlation between X and Y
Additive: Z also impacts Y
Interactive: Z modifies the effect of X on Y
In an additive relationship, Z independently influences the outcome, but it doesn’t account for the relationship between X and Y
Examples:
Being Republican and conservative both make people more likely to vote for Republican candidates.
GDP and literacy rates both make countries more likely to be democratic.
Genetics and environment both impact life expectancy separately.
Additive relationships will look like two roughly parallel lines with the same slope but different intercepts.
Additive relationships are different from confounders. Other variables may matter for the outcome, but they don’t bias our ability to estimate the effect of the IV on the DV (the slope is the same with or without the control).
(In practice, two lines may only be approximately parallel.)
In an interactive relationship, Z strengthens or weakens the effect of X on Y.
Weight changes the effect of alcoholic drinks on blood alcohol level. (smaller people get drunk with fewer drinks, all else equal)
Issue salience makes policy views more important. (i.e. if a candidate talks a lot about abortion, abortion opinions will matter more for vote choice)
State referenda make state policy more likely to align with public opinion.
Interactive relationships will look like two distinctly non-parallel lines.
For instance: the effect of strong religious views on party ID and voting behavior is different for white and black respondents.
In practice, we will often have more than one IV and more than one control because lots of things can be explanatory.
Your proposed control may be someone else’s main IV. Being a “control” is a property of a theorized relationship, not an intrinsic feature of any variable.
Our goal isn’t necessarily to account for everything. Accounting for confounders is very important. Accounting for additive relationships is not especially important.
More controls = less data. We’re “splitting” the data up by levels of multiple confounders, and at a certain point we don’t have enough observations in any one group to say anything useful.
Why are experiments still preferable to mathematical controls?
Randomization can guarantee there’s no confounding
Mathematical controls only work if:
We know what the confounder is
We can measure it
We have enough data to make meaningful comparisons after dis-aggregation
First, get the effect of the IV within each value of the control.
| Gun Control | Women: Dem | Women: Rep | Men: Dem | Men: Rep |
|---|---|---|---|---|
| Oppose | 301 (20%) | 825 (74%) | 299 (29%) | 962 (83%) |
| Support | 1177 (80%) | 297 (26%) | 716 (71%) | 204 (17%) |
Effect of Democrat vs. Republican for Women: \[ 20 - 74 = -54 \]
Effect of Democrat vs. Republican for Men: \[ 29 - 83 = -54 \]
If they’re different, you can average the two effects to get a rough summary of the impact of the IV on the DV after controlling for “Z”
Then do the same for the control: get the effect of Z within each value of the IV.
Effect of Women vs. Men for Democrats \[ 20 - 29 = -9 \]
Effect of Women vs. Men for Republicans \[ 74 - 83 = -9 \]
Partial Effect of party = \[ (-54 + -54) / 2 = -54 \]
Partial Effect of gender = \[ (-9 + -9) / 2 = -9 \]
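The partial effects above can be reproduced from the counts in the gun control table. This is a sketch under the notes' convention of rounding each percentage to a whole number before subtracting; the names are illustrative choices.

```python
# Counts from the gun control table: counts[gender][party] = (oppose, support)
counts = {
    "Women": {"Dem": (301, 1177), "Rep": (825, 297)},
    "Men": {"Dem": (299, 716), "Rep": (962, 204)},
}

def pct_oppose(oppose, support):
    """Percentage opposing gun control, rounded to a whole number."""
    return round(100 * oppose / (oppose + support))

pct = {
    gender: {party: pct_oppose(*cell) for party, cell in parties.items()}
    for gender, parties in counts.items()
}

# Effect of the IV (party) within each value of the control (gender)
party_effects = {g: pct[g]["Dem"] - pct[g]["Rep"] for g in pct}
# Effect of the control (gender) within each value of the IV (party)
gender_effects = {p: pct["Women"][p] - pct["Men"][p] for p in ("Dem", "Rep")}

# Partial effect = simple average of the within-group effects
partial_party = sum(party_effects.values()) / len(party_effects)
partial_gender = sum(gender_effects.values()) / len(gender_effects)

print(party_effects, partial_party)
print(gender_effects, partial_gender)
```

A simple average treats each group equally; weighting by group size is another option, but that is not the approach shown in these notes.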
To characterize the pattern after control, you can ask yourself a series of questions.
Question 1: Does a relationship exist between the IV and the DV in at least one value of the control variable?
If not, then the relationship is spurious
If so, then go to the next question
Question 2: is the direction of the relationship between the IV and the DV about the same for all values of the control?
If no, then this is an interaction
If yes, then go to the next question
Question 3: is the strength of the relationship between the IV and the DV the same for all values of the control?
If no, then this is an interaction
If yes, then the relationship is additive
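The three questions can be sketched as a small decision function. The `tol` cutoff (in percentage points) is an arbitrary assumption made for this example; the notes do not define how close effects must be to count as "zero" or "about the same", so in practice that remains a judgment call.

```python
def classify(effects, tol=5.0):
    """Classify the pattern after control.

    effects: the effect of the IV on the DV within each value of the control.
    tol: hypothetical cutoff for treating an effect as zero, or two effects
         as equal in strength.
    """
    # Question 1: does a relationship exist in at least one value of the control?
    if all(abs(e) <= tol for e in effects):
        return "spurious"
    # Question 2: is the direction the same for all values of the control?
    signs = {e > 0 for e in effects if abs(e) > tol}
    if len(signs) > 1:
        return "interaction"
    # Question 3: is the strength about the same for all values of the control?
    if max(effects) - min(effects) > tol:
        return "interaction"
    return "additive"

print(classify([-54, -54]))     # party effect on gun control, by gender
print(classify([12, -14, 6]))   # effects that change sign across groups
print(classify([1, -2]))        # effects that are all close to zero
```

With this cutoff, the three calls print `additive`, `interaction`, and `spurious` respectively.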
Effect of age (18-65 minus 65+) on the percentage answering “worse”, within each party:
Dems: \[36 - 24 = 12\]
Independents: \[66 - 80 = -14\]
Republicans: \[85 - 79 = 6\]
| | Dem: 18-65 | Dem: 65+ | Ind: 18-65 | Ind: 65+ | Rep: 18-65 | Rep: 65+ |
|---|---|---|---|---|---|---|
| better | 371 (33%) | 165 (53%) | 22 (9%) | 2 (8%) | 38 (4%) | 16 (6%) |
| same | 341 (31%) | 69 (22%) | 62 (25%) | 3 (12%) | 113 (11%) | 43 (15%) |
| worse | 404 (36%) | 75 (24%) | 160 (66%) | 19 (80%) | 849 (85%) | 221 (79%) |
For mean comparisons, calculate the difference going from bottom to top (or top to bottom)
Effect of going from Democrat to Republican: \(52.8 - 74.1 = -21.3\)
| Party ID | Mean FT |
|---|---|
| Dem | 74.1 (2865) |
| Rep | 52.8 (2602) |
For controlled mean comparisons, calculate the effect at each value of the control, then average the effects.
18-39: \(57.8 - 77.8 = -20\)
39-59: \(53.6 - 73.3 = -19.7\)
60+: \(48.4 - 71.3 = -22.9\)
Mean partial effect of Party ID controlling for age: \((-20 + -19.7 + -22.9) / 3 \approx -21\)
| Party ID | 18-39 | 39-59 | 60+ |
|---|---|---|---|
| Dem | 77.8 (959) | 73.3 (866) | 71.3 (914) |
| Rep | 57.8 (666) | 53.6 (887) | 48.4 (923) |
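The controlled mean comparison can be checked with a few lines of code. This is a sketch that uses the group means from the table and a simple unweighted average across age groups; the names are chosen for this example.

```python
# Mean feeling thermometer scores from the table: means[age_group][party]
means = {
    "18-39": {"Dem": 77.8, "Rep": 57.8},
    "39-59": {"Dem": 73.3, "Rep": 53.6},
    "60+": {"Dem": 71.3, "Rep": 48.4},
}

# Effect of party (Rep minus Dem) within each age group
effects = {age: round(m["Rep"] - m["Dem"], 1) for age, m in means.items()}

# Mean partial effect: unweighted average of the within-group effects
partial_effect = sum(effects.values()) / len(effects)

for age, effect in effects.items():
    print(f"{age}: {effect}")
print(f"mean partial effect of party: {partial_effect:.1f}")
```

The within-group effects are all negative and similar in size, so the average (about -21) is a reasonable summary of the party effect after controlling for age.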
| iv | South | Northeast | Midwest | West |
|---|---|---|---|---|
| Biden 20 | 7.6 (4) | 14.4 (9) | 12.8 (4) | 12.4 (8) |
| Trump 20 | 5.9 (12) | 0 (0) | 8.5 (8) | 9.5 (5) |
Use a cross tab when all variables are categorical
Use a mean comparison when the DV is interval and the control and IV are categorical
Consider collapsing some categories to simplify analyses and ensure you have sufficient data. For instance, if a response ranges from Strongly Agree to Strongly Disagree, you might collapse it down to just 2-3 categories.
You can collapse interval data into categories and then use a mean comparison
More often, we’ll use regression analysis.
Identify similar cases at each level of the IV. For instance, find Republicans and Democrats of similar ages.
Drop any cases that can’t be matched.
Re-weight the cases to ensure balance, and then assess the effect.
Measure outcomes for two groups at two different times
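If you can assume the trend for both groups would have been the same without the treatment, then the difference-in-differences (the change in the treated group minus the change in the comparison group) is an estimate of the effect of the treatment.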
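One common way to write this comparison is shown here. The notation is an assumption introduced for illustration: \(\bar{Y}_{T,t}\) is the mean outcome for the treated group at time \(t\), and \(\bar{Y}_{C,t}\) is the mean outcome for the comparison group at time \(t\).

\[ \text{DiD} = (\bar{Y}_{T,2} - \bar{Y}_{T,1}) - (\bar{Y}_{C,2} - \bar{Y}_{C,1}) \]

The second term is the change we would expect from the shared trend alone, so subtracting it leaves only the part of the treated group’s change that can be attributed to the treatment.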
March 12
Online as an ELMS quiz
Becomes available at 8 AM
Must be finished by 4 PM
Once you start, you’ll have one hour to complete (if you start at 3:30, that only leaves you with 30 minutes, so don’t do that)
All fill in the blank/multiple choice
~30 questions
Covers materials from chapters 1 - 5
I’ll post a list of key concepts this week, and next Monday (3/10) we’ll have a Q & A session to prep